#################################################################################################################################
# README to accompany code submission for investor analysis elements of:
#
# At the Top of the Mind: Peak Prices and the Disposition Effect, by E. Quispe-Torreblanca, D. Hume, J. Gathergood, G. Loewenstein, and N. Stewart, forthcoming (2025) in the Journal of Political Economy Microeconomics
#
##################################################################################################################################
#
#
#                      Overview of the Raw Data Structure and Content
#
# The data for this study were provided free of charge by Barclays Stockbroking, an execution-only online brokerage service operating in the United Kingdom. 
# The dataset spans April 2012 to March 2016 and comprises daily-level records of trades and quarterly-level records of portfolio positions.
# It also includes a daily-level dummy variable that indicates whether the investor logged into their account on each day.
# Researchers interested in accessing these data are advised to contact the Barclays Data Hub team via xrabarclaysdataservi@barclays.com.
#
#
# Our raw data are organized into three main directories:
#
# The first folder (output--transactions_20160826) contains one file per investor, detailing every transaction they made.
# The second folder (output--valuations_holdings) records the positions each investor held in each stock on specific days, which may or may not coincide with transaction dates.
# The third folder (output--logins) records each investor's login activity, listing every day on which the investor logged in.
# Like the first folder, the second and third folders hold a separate file for each investor.
#
# Additionally, we have the following files:
# - Demographics: 
#     'anonymous_customer_level_data_merged_master.csv' (all accounts)
#     'anonymous_customer_level_data_merged_master_login.csv' (including only accounts with postcodes)
#
# - Price data:
#     'all_sedols_isin_matching.csv' (matching table between ISIN and SEDOL codes, which has been retrieved from Datastream)
#     'datastream_20180601.csv' (price data for each stock, which has been retrieved from Datastream)
#     'ftse100 return.csv' (FTSE100 returns)
#     '260118 isins sedols recovered static and industry.csv' (SEDOLs with industry information)
#
#
#################################################################################################################################

#################################################################################################################################
#
#					R Codes 
#
# To ensure complete reproducibility, the following scripts should be run in the specified sequence.
# The scripts include the location paths of each raw-data folder on our virtual machines.
# 
# 1. Cleaning transaction data (cleaning_data1.R)
#
# This script cleans our transaction data, focusing especially on new accounts. Accounts that were opened after April 1, 2012, are classified as "new accounts" in our data.
# The key function of this code is to compute the Quantity Weighted Average Purchase Price (QWAPP) for each transaction day.
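#
# As a rough illustration of the QWAPP idea (this is not the code in cleaning_data1.R), the sketch below
# computes a running quantity-weighted average purchase price for each account-stock pair. The data frame,
# its column names (anon, sedol, date, quantity, price), and the function name running_qwapp are assumptions
# made for this example. It covers purchases only; how the script treats sales is not described here.

```r
# Toy purchase records for illustration; all column names are hypothetical.
buys <- data.frame(
  anon     = c(1, 1, 1, 2),
  sedol    = c("A", "A", "A", "B"),
  date     = as.Date(c("2012-05-01", "2012-05-03", "2012-05-10", "2012-05-01")),
  quantity = c(100, 300, 50, 10),
  price    = c(2.00, 2.40, 2.50, 7.00)
)

# Running quantity-weighted average purchase price within each account-stock pair:
# cumulative purchase value divided by cumulative quantity bought.
running_qwapp <- function(d) {
  d <- d[order(d$anon, d$sedol, d$date), ]
  grp <- interaction(d$anon, d$sedol, drop = TRUE)
  cum_value <- ave(d$quantity * d$price, grp, FUN = cumsum)
  cum_qty   <- ave(d$quantity, grp, FUN = cumsum)
  d$qwapp <- cum_value / cum_qty
  d
}

buys_qwapp <- running_qwapp(buys)
print(buys_qwapp)
```

# In this toy example, the second purchase by account 1 moves the average from 2.00 to 920 / 400 = 2.30.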
#
# To speed up processing, we executed the script on two separate virtual machines.
# The first machine processed all investor accounts with an 'anon' identifier below 100000 (anon < 100000), while the second machine processed the remaining accounts (anon >= 100000).
# Here, 'anon' refers to the unique identifier assigned to each investor.
#
# Once processed, the cleaned data files were stored in the folder titled 'new_acc_clean_Tr'. Each 'anon' has an individual file within this folder.
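#
# The split across machines could be expressed along the following lines. This is only a sketch: the file
# naming pattern "<anon>.csv" is an assumption, not the naming used in the raw data folders.

```r
# Hypothetical per-investor file names of the form "<anon>.csv".
files <- c("12.csv", "99999.csv", "100000.csv", "250731.csv")

# Recover the numeric identifier from each file name.
anon <- as.integer(sub("\\.csv$", "", basename(files)))

# Machine 1 handles anon < 100000; machine 2 handles anon >= 100000.
batches <- split(files, ifelse(anon < 100000, "machine_1", "machine_2"))
print(batches)
```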
#
#
# 2. Computing portfolio data for login days (cleaning_data2.R)
#
# The 'new_acc_clean_Tr' folder created by 'cleaning_data1.R' holds portfolio data for each investor account on its transaction days.
# This script generates portfolio data for each account on all of its login days.
# Login days are stored in a separate folder (output--logins).
#
# This script applies the function extract_port_on_many_days to each account in the 'new_acc_clean_Tr' folder, combining its portfolio data with its login dates.
# The result is a long panel of portfolio data for every 'anon' account on all login days, which also include the transaction days.
# In the processed data, each observation is a unique combination of stock, date, and account.
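#
# The idea of extending transaction-day portfolios to login days can be sketched as a carry-forward: for
# each login day, take the most recent transaction-day portfolio on or before that day. The code below is an
# illustrative stand-in, not the extract_port_on_many_days function itself; the function name
# portfolio_on_days_sketch, the column names, and the toy data for a single account are all assumptions.

```r
# Toy transaction-day snapshots for one account; each snapshot lists the full portfolio.
snapshots <- data.frame(
  date     = as.Date(c("2012-05-01", "2012-05-01", "2012-05-10", "2012-05-10")),
  sedol    = c("A", "B", "A", "B"),
  quantity = c(100, 20, 150, 20)
)
logins <- as.Date(c("2012-04-30", "2012-05-03", "2012-05-10", "2012-05-15"))

portfolio_on_days_sketch <- function(snapshots, days) {
  snap_dates <- sort(unique(snapshots$date))
  # Index of the latest snapshot date on or before each day (0 means no snapshot yet).
  idx <- findInterval(as.numeric(days), as.numeric(snap_dates))
  rows <- lapply(which(idx > 0), function(i) {
    held <- snapshots[snapshots$date == snap_dates[idx[i]], c("sedol", "quantity")]
    data.frame(date = days[i], held, row.names = NULL)
  })
  do.call(rbind, rows)
}

panel <- portfolio_on_days_sketch(snapshots, logins)
print(panel)
```

# Days before the first transaction (here 2012-04-30) produce no rows, since nothing is held yet.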
#
# The long panel generated by this function is stored in the file "310319sell_day_portfolios_newacc_BIG_1_to_10000.csv".
# (To speed up processing, we divided the workload between two virtual machines, as described for 'cleaning_data1.R' above.
# The corresponding file produced on the second virtual machine is "310319sell_day_portfolios_newacc_BIG_10000_to_more.csv".)
#
#
# 3. Portfolio data processing and calculation of peak prices (cleaning_data3.R)
#
# This script loads the panel data generated by 'cleaning_data2.R'.
# Because that step was split across two virtual machines, this script first merges the two resulting datasets and then computes the variables needed for the regression analyses in the paper.
#
# Output Files:
#
# - Cleaned data where each observation represents an account, stock, and date:
#   - `data_peak_prices_cleaned.csv`
#
# - Cross-sectional data where each observation represents an account:
#   - `data_peak_cross_sec_acc.csv`
#
# - Peak prices data where the peak is the highest price each investor has experienced since the purchase:
#   - `MAXC_peak_5update.csv` (using a minimum of 5 business days to update the peak)
#   - `MAXC_peak_20update.csv` (using a minimum of 20 business days to update the peak)
#
# - Peak prices determined within a one-year window:
#   - `peak_past_year_5update.csv` (using a minimum of 5 business days to update the peak)
#   - `peak_past_year_20update.csv` (using a minimum of 20 business days to update the peak)
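#
# This README does not spell out the exact updating rule behind the "5update" and "20update" files. The
# sketch below assumes one possible reading, namely that a price counts toward the peak only once it is at
# least k business days old, so the peak on a given day is the running maximum lagged by k days. The function
# name lagged_running_peak and the toy prices are hypothetical; cleaning_data3.R may implement the rule
# differently.

```r
# Running peak since purchase, lagged by k rows (one row per business day).
# This is an assumed interpretation of the minimum-days-to-update rule.
lagged_running_peak <- function(price, k) {
  n <- length(price)
  peak <- rep(NA_real_, n)
  if (n > k) {
    peak[(k + 1):n] <- cummax(price)[1:(n - k)]
  }
  peak
}

# Toy daily closing prices from the purchase day onward.
price <- c(10, 12, 11, 15, 14, 13, 16, 12)
peak_k2 <- lagged_running_peak(price, k = 2)

# Return since the (lagged) peak; NA until a peak is defined.
ret_since_peak <- price / peak_k2 - 1
print(data.frame(price, peak_k2, ret_since_peak))
```

# Under the same assumption, a one-year-window variant would replace the running maximum with a rolling
# maximum over the trailing year.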
#
# 4. Recreating Tables and Figures (`processing_data_and_regressions.R`)
#  
# The first part of this script loads the datasets generated by 'cleaning_data3.R'.
# The second part runs the regressions and recreates all tables and figures presented in the paper.
# The code indicates which tables each section produces.
#
# The code calls other scripts for specific analyses. Included Scripts:
#
# - `excess_returns.R` calculates excess post-sale returns relative to the FTSE100. Additionally, it examines returns based on asset volatility. Other robustness checks include:
#    1) Analysis of the purchase and peak price disposition effects by splitting the sample into ten deciles based on returns since purchase.
#    2) Testing investor responses to various gain measures.
#    3) Investigating how the timing of peaks influences selling decisions.
#
# - `hist_plots.R` computes several histograms of returns.
#
# - `mechanism.R` analyzes interactions with market performance.
#   It presents estimates from subsamples defined by market gains/losses since purchase and since the peak price event.
#   Additionally, it tests for rebalancing by restricting the dependent variable to indicate complete sales only.
#
# - `top_up.R` explores the behavior of investors making additional purchases (top-ups) of stocks.
#
# - `cox_model.R` estimates a stratified Cox proportional hazard model with time-varying covariates.
#
# - `top_up_placebo.R` studies the likelihood of investors topping up their current positions in cases where the investor did and did not hold the stock during the peak price.
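#
# To make the notion of an excess post-sale return concrete, the sketch below subtracts the FTSE100 return
# from the stock return over the same horizon after a sale. The horizon, the toy prices, and the function
# name forward_return are assumptions for illustration; the definitions used in `excess_returns.R` may differ.

```r
# Cumulative return over the h rows following position i (one row per trading day).
forward_return <- function(price, i, h) price[i + h] / price[i] - 1

stock <- c(100, 102, 101, 105, 110)        # toy stock prices; the sale occurs on day 1
ftse  <- c(6000, 6030, 6060, 6090, 6120)   # toy index levels on the same days

# Excess return over a 4-day horizon: stock return minus index return.
excess <- forward_return(stock, 1, 4) - forward_return(ftse, 1, 4)
print(excess)
```

# Here the stock gains 10% and the index 2%, so the excess return after the sale is 8 percentage points.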
#
#################################################################################################################################

#################################################################################################################################
#
# Required R Packages
#
# The analyses were conducted using R version 4.3.1. The following packages are required:
#
# - `devtools`
# - `data.table`
# - `statar`
# - `dplyr`
# - `lubridate`
# - `zoo`
# - `bizdays`
# - `logr`
# - `ggplot2`
# - `stargazer`
# - `stringi`
# - `magrittr`
# - `Hmisc`
# - `lfe`
# - `xtable`
# - `Rmisc`
# - `gmodels`
# - `stringr`
# - `survival`
# - `readr`
# - `speedglm`
# - `tidyr`
# - `ggpubr`
# - `cowplot`
# - `scales`
#
# Most packages can be installed from CRAN using `install.packages()`. The scripts also require `starpolishr`, which is installed from GitHub:
#
# devtools::install_github("ChandlerLutz/starpolishr")
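#
# Before running the scripts, it may help to check which of the CRAN packages listed above are not yet
# installed. The helper below is a convenience sketch and not part of the submitted code; the function name
# missing_packages is hypothetical.

```r
# CRAN packages listed in this README.
cran_pkgs <- c("devtools", "data.table", "statar", "dplyr", "lubridate", "zoo",
               "bizdays", "logr", "ggplot2", "stargazer", "stringi", "magrittr",
               "Hmisc", "lfe", "xtable", "Rmisc", "gmodels", "stringr", "survival",
               "readr", "speedglm", "tidyr", "ggpubr", "cowplot", "scales")

# Return the packages that cannot be found in the current library paths.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!found]
}

to_install <- missing_packages(cran_pkgs)
if (length(to_install) > 0) {
  message("Missing CRAN packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```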
#
#################################################################################################################################


#################################################################################################################################
# Output Identification
# 
#
# Below is a detailed list of all tables and figures produced by our code, with references to the
# corresponding output files (.tex for tables, .pdf for figures).

# Tables, figures, and corresponding files that use the investor data.

# FIGURES:

# Figure 1: Examples of Peak Prices
# figures/example_restriction_peaks_week.pdf

# Figure 2 (uses housing data rather than investor data, not produced by these scripts).

# Figure 3: Probability of Stock Sale, Returns Since Purchase and Returns Since Peak
# figures/patter_DE_purch_MAXC_peak_5update_sellsample.pdf
# figures/patter_DE_since_MAXC_peak_5update_sellsample.pdf
# figures/patter_DE_interaction_MAXC_peak_5update_sellsample.pdf

# Figure 4: Examples of Stock Price Trajectories for the Placebo Analysis
# figures/hold_gainpeak_gainpurch.pdf
# figures/nohold_gainpeak_gainpurch.pdf

# Figure 5: Stock Top-Ups, Returns Since Purchase and Return Since Peak
# figures/TOPUP_basic_DE_purch_MAXC_peak_5update_loginsample.pdf
# figures/TOPUP_DE_since_MAXC_peak_5update_loginsample.pdf

# TABLES:

# Tables 1 to 4 (use housing data rather than investor data, not produced by these scripts).

# Table 5: Purchase and Peak Price Disposition Effect for Stocks: OLS Estimates
# tables/clean_OLS_MAXC_peak_5update_sellsample.tex

# Table 6: Purchase and Peak Price Disposition Effects for Stocks: Splitting the Sample into Ten Deciles Based on Returns Since Purchase
# tables/clean_OLS_deciles_sellsample

# Table 7 (uses housing data rather than investor data, not produced by these scripts).

# Table 8: Purchase and Peak Price Disposition Effects for Stocks: Controlling for Alternative Reference Points
# tables/controling_alternative_rp_sell_sample.tex

# Table 9: Purchase and Peak Price Disposition Effects for Stocks: Impact of Peak Timing
# tables/peak_timing_wide_sell_sample.tex

# Table 10: Estimates of the Stocks Disposition Effect, Placebo Analysis
# tables/clean_OLS_noholding_past_year_peak_5update_sellsample.tex
# tables/clean_OLS_holding_past_year_peak_5update_sellsample.tex

# Table 11: Ex Post Returns for Stocks 
# tables/excess_returns.tex
# tables/excess_returns_peak.tex

# APPENDIX FIGURES:

# Figure A1: Illustration of the Model of Multiple Reference Points
# Note: Hypothetical example, not based on actual data

# Figures A2 to A12 (use housing data rather than investor data, not produced by these scripts).

# Figure A13: Stock Returns Since Purchase and Returns Since Peak (Week Definition)
# figures/hist_retsincepur_MAXC_peak_5update_sellsample.pdf
# figures/hist_retsincepeak_MAXC_peak_5update_sellsample

# Figure A14: Example of Peak Prices, Stock Data, Month Definition
# figures/example_restriction_peaks_month.pdf

# Figure A15: Stock Returns Since Purchase and Returns Since Peak (Month Definition)
# figures/hist_retsincepur_MAXC_peak_20update_sellsample.pdf
# figures/hist_retsincepeak_MAXC_peak_20update_sellsample

# Figure A16: Probability of Stock Sale, Returns Since Purchase and Returns Since Peak (Month Definition)
# figures/patter_DE_purch_MAXC_peak_20update_sellsample.pdf
# figures/patter_DE_since_MAXC_peak_20update_sellsample.pdf
# figures/patter_DE_interaction_MAXC_peak_20update_sellsample.pdf

# Figure A17: Stock Returns Since Purchase and Returns Since Peak, Login-Day Sample
# figures/hist_retsincepur_MAXC_peak_5update_loginsample.pdf
# figures/hist_retsincepeak_MAXC_peak_5update_loginsample.pdf

# Figure A18: Probability of Stock Sale, Returns Since Purchase and Returns Since Peak in the Login-Day Sample
# figures/patter_DE_purch_MAXC_peak_5update_loginsample.pdf
# figures/patter_DE_since_MAXC_peak_5update_loginsample.pdf
# figures/patter_DE_interaction_MAXC_peak_5update_loginsample.pdf

# Figure A19: Probability of Stock Sale, Returns Since Purchase, Returns Since Peak, and Returns Since Last Login Day
# figures/patter_DE_triple_interaction_MAXC_peak_5update_sellsample.pdf

# APPENDIX TABLES:

# Table A1: Example of Trading Strategies for Different Reference Points With Prospect Theory Preferences
# Note: Hypothetical example, not based on actual data.

# Tables A2 and A3: Sample Selection Steps for Each Dataset
# Note: Selection steps are described in the paper, with observation counts entered manually.

# Tables A4 and A5 (use housing data rather than investor data, not produced by these scripts).

# Table A6: Stockbroking Accounts Sample Summary Statistics
# tables/stats_account_MAXC_peak_5update_loginsample.tex

# Tables A7 to A9 (use housing data rather than investor data, not produced by these scripts).

# Table A10: Summary Statistics for Stock Returns Since Purchase and Returns Since Peak Price
# tables/clean_summary_stats_returns_MAXC_peak_5update_sellsample.tex

# Tables A11 and A12 (use housing data rather than investor data, not produced by these scripts).

# Table A13: Summary Statistics for Stock Returns Since Purchase and Returns Since Peak Price (Month-Peak)
# tables/clean_summary_stats_returns_MAXC_peak_20update_sellsample.tex

# Tables A14 to A24 (use housing data rather than investor data, not produced by these scripts).

# Table A25: Estimates of the Disposition Effect for Stocks Including Portfolio and Demographic Controls (Month-Peak)
# tables/full_control_MAXC_peak_20update_sell.tex

# Table A26: Estimates of the Disposition Effect for Stocks Including Portfolio and Demographic Controls (Login-Days)
# tables/full_control_MAXC_peak_5update_login.tex

# Tables A27 to A30 (use housing data rather than investor data, not produced by these scripts).

# Table A31: Purchase and Peak Price Disposition Effects for Stocks: Individual Fixed Effects Estimates
# tables/clean_FE_MAXC_peak_5update_sellsample.tex

# Table A32: Purchase and Peak Price Disposition Effects for Stocks: Including Continuous Returns Since Purchase and Since Peak Price
# tables/clean_OLS_returns_MAXC_peak_5update_sellsample.tex

# Table A33: Purchase and Peak Price Disposition Effects for Stocks: Including Portfolio and Demographic Controls
# tables/full_control_MAXC_peak_5update_sell.tex

# Table A34: Cox Proportional Hazard Model Estimates of the Stocks Disposition Effect (Week-Peak)
# tables/cox_MAXC_peak_5update_sellsample.tex

# Table A35: Estimates of the Stocks Disposition Effect Excluding Partial Sells
# tables/clean_rebalancing_MAXC_peak_5update_sellsample.tex

# Table A36: The Stocks Disposition Effect: Sub-Sample Analysis, Sell-Day Sample, Week-Peak
# tables/clean_ftse_up_sell.tex
# tables/clean_ftse_down_sell.tex
# tables/clean_days_since_peak_sell1.tex
# tables/clean_days_since_peak_sell2.tex
# tables/clean_days_since_purchase_sell1.tex
# tables/clean_days_since_purchase_sell2.tex

# Table A37: The Stocks Disposition Effect: Demographics Sub-Sample Analysis, Sell-Day Sample, Week-Peak
# tables/clean_femalesell.tex
# tables/clean_malesell.tex
# tables/clean_age_sell1.tex
# tables/clean_age_sell2.tex
# tables/clean_tenure_sell1.tex
# tables/clean_tenure_sell2.tex
# tables/clean_PV_sell1.tex
# tables/clean_PV_sell2.tex
# tables/clean_stocks_sell1.tex
# tables/clean_stocks_sell2.tex

# Table A38: Estimates of the Stocks Disposition Effect Sub-samples by FTSE100 Returns Since Purchase (Week-Peak), Sell-Day Sample
# tables/clean_ftse1_MAXC_peak_5update_sellsample.tex

# Table A39: Estimates of the Stocks Disposition Effect Sub-samples by FTSE100 Returns Since Purchase (Week-Peak), Sell-Day Sample
# tables/clean_ftse2_MAXC_peak_5update_sellsample.tex

# Table A40: Stock Top-Up Behavior When Stocks Are in Loss Since Purchase: OLS and Individual Fixed Effects Estimates
# tables/clean_TOPUP_returns_since_purchase_MAXC_peak_5update_loginsample.tex

# Table A41: Top-Up Behavior When Stocks Are in Loss Since Past Peak Price: OLS and Individual Fixed Effects Estimates
# tables/clean_TOPUP_returns_since_peak_MAXC_peak_5update_loginsample.tex

# Table A42: Top-Up Behavior When Stocks Are in Loss Since Past Peak Price: Placebo Test (Peaks Defined for the Past Year)
# tables/clean_TOPUP2_returns_since_peak_past_year_peak_5update_loginsample.tex

# Table A43: Average Returns
# tables/av_returns.tex
# tables/av_returns_peak.tex

# Table A44: Average Returns by Frequency of Peaks
# tables/av_returns_low.tex
# tables/av_returns_peak_low.tex
# tables/av_returns_high.tex
# tables/av_returns_peak_high.tex

# Table A45: Ex Post Returns by Frequency of Peaks
# tables/excess_returns_low.tex
# tables/excess_returns_peak_low.tex
# tables/excess_returns_high.tex
# tables/excess_returns_peak_high.tex
#################################################################################################################################


